Abstract

Parton distribution functions (pdfs) are an important ingredient for LHC phenomenology.
Recent progress in determining pdfs from global analyses is reviewed, and some of the most
important outstanding issues are highlighted. Particular attention is paid to the precision
with which predictions for LHC 'standard-candle' cross sections can be made, and also to new
information that LHC can provide on pdfs.

1 Introduction

High-precision cross-section predictions for both Standard Model and Beyond Standard Model processes
at the LHC require high-precision parton distribution functions (pdfs). In some cases, the uncertainty in
our knowledge of the pdfs is a significant or even dominant part of the overall uncertainty in the
theoretical prediction. Of course, the more accurate the signal and background predictions, the easier it
will be to detect new physics. Fortunately the LHC provides a number of 'standard-candle' processes, whose
measured cross sections can be used to check the theoretical framework (factorisation, DGLAP evolution
etc.). The paradigms are $\sigma(Z)$ and $\sigma(W)$, for which there are realistic prospects of experimental
measurements and theoretical predictions accurate at the few % level.

At the same time, the LHC can provide new information on the pdfs themselves. Hadron collider
data have always been an important ingredient of pdf global fits. For example, fixed-target Drell-Yan
data currently provide (unique) information on high-$x$ sea quarks, Tevatron high-$E_T$ jet data provide
direct information on the high-$x$ gluon, and Tevatron $W$ and $Z$ cross sections and distributions provide
information on quark distributions complementary to that from deep inelastic scattering. There is every
prospect that similar measurements at the LHC will improve our knowledge of pdfs even further.

The basic theoretical tool for precision predictions for hadron colliders such as the Tevatron and 
the LHC is the QCD factorization theorem for short-distance inclusive processes: 



$$\sigma_{AB} = \int dx_a\, dx_b\; f_{a/A}(x_a,\mu_F^2)\, f_{b/B}(x_b,\mu_F^2) \times \left[\hat\sigma_0 + \alpha_S(\mu_R^2)\,\hat\sigma_1 + \cdots\right]_{ab\to X} . \qquad (1)$$



Formally, the cross section calculated to all orders in perturbation theory is invariant under changes in
the factorization scale ($\mu_F$) and the renormalization scale ($\mu_R$), the scale dependence of the
coefficients $\hat\sigma_0, \hat\sigma_1, \ldots$ exactly compensating the explicit scale dependence of the pdfs
and the QCD coupling constant. This compensation becomes more exact as more terms are included in the
perturbation series. In the absence of a complete set of higher-order corrections, it is necessary to make a
specific choice for the two scales in order to make cross-section predictions. A variation of the scales by
a factor of 2 around some 'natural' scale $M$ for the process, i.e. $M/2 < \mu_F, \mu_R < 2M$, is often used¹
to characterise the uncertainty from unknown higher-order terms in the series. The overall theoretical error
on a cross section prediction can then be estimated as
$\delta\sigma_{\rm th} = \delta\sigma_{\rm pdf} + \delta\sigma_{\rm scl}$.

† Presented at the XXXVIII International Symposium on Multiparticle Dynamics (ISMD 2008), 15-20 September 2008,
DESY, Hamburg, Germany.

¹ Care should be taken when comparing scale uncertainties produced in this way. Some authors set
$\mu = \mu_F = \mu_R$ and vary $\mu$ in the standard range, while others either allow $\mu_F$ and $\mu_R$ to vary
independently in the range, or place additional restrictions on $r = \mu_F/\mu_R$, e.g. $1/2 < r < 2$.

Almost all the theoretical quantities (subprocess cross sections, coefficient functions and splitting
functions) that are needed for a global fit are nowadays known to NNLO in pQCD, and so this will be
the de facto benchmark for LHC phenomenology. In some cases, e.g. $W$ and $Z$ production, electroweak
corrections are also known and can be included. The following table illustrates the relative size of the pdf
and scale uncertainties for some standard processes at the LHC, calculated at NNLO² in pQCD. Here the
pdf uncertainties are taken from the recent MSTW global fit [1,2], while the scale uncertainty estimates
for $t\bar t$ and Higgs production are taken from Refs. [3] and [4] respectively. Evidently the pdf uncertainty
is a significant issue for $Z$ and $t\bar t$ production, but not at present for Higgs production.



process                              pdf uncertainty    scale uncertainty

$pp \to Z + X$                       ±2%                ±1%
$pp \to t\bar t + X$                 ±2%                ±3%
$pp \to H(120\ {\rm GeV}) + X$       ±2%                ±15%
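The scale-variation prescription and the linear combination of pdf and scale uncertainties described above can be sketched numerically. The snippet below is only an illustration under stated assumptions: `toy_sigma` is a hypothetical cross section with a residual logarithmic scale dependence (not a real calculation), and the two scales are tied together, $\mu_F = \mu_R = \mu$, which is just one of the conventions mentioned in the footnote on scale uncertainties.

```python
import math


def scale_envelope(sigma, M, n_points=21):
    """Return (central, down, up) for mu = mu_F = mu_R varied over [M/2, 2M].

    `sigma` is any callable mapping a scale mu to a cross section.
    """
    central = sigma(M)
    # Sample mu uniformly in log(mu) between M/2 and 2M (endpoints included).
    mus = [M * 2.0 ** (-1.0 + 2.0 * i / (n_points - 1)) for i in range(n_points)]
    values = [sigma(mu) for mu in mus]
    return central, central - min(values), max(values) - central


def toy_sigma(mu, M=91.2, sigma0=1.0, c=0.03):
    """Hypothetical toy model: a truncated series leaves a log(mu/M) dependence."""
    return sigma0 * (1.0 + c * math.log(mu / M))


def combine_linear(delta_pdf, delta_scl):
    """Linear sum of pdf and scale uncertainties, as in the estimate in the text."""
    return delta_pdf + delta_scl


if __name__ == "__main__":
    central, down, up = scale_envelope(toy_sigma, 91.2)
    print(f"sigma = {central:.4f} -{down:.4f} +{up:.4f} (scale)")
    print(f"total relative error: {combine_linear(0.02, up / central):.3f}")
```

Varying $\mu_F$ and $\mu_R$ independently would need a two-dimensional scan and generally gives a different envelope, which is why the convention should be stated when quoting such numbers.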



2 How pdfs are obtained 

The method by which pdfs are obtained from a global fit to a variety of 'hard scattering' data is by now
well known; a schematic summary is shown in Fig. 1. A typical set of input data (as used, for example,
by the MSTW and CTEQ collaborations, see Section 3) is given in the following Table, together with the
partons that they constrain.



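experiment               data                                                              partons constrained

H1, ZEUS                 $F_2^{e^+p}(x,Q^2)$, $F_2^{e^-p}(x,Q^2)$, NC + CC
BCDMS                    $F_2^{\mu p}(x,Q^2)$, $F_2^{\mu d}(x,Q^2)$
NMC                      $F_2^{\mu p}(x,Q^2)$, $F_2^{\mu d}(x,Q^2)$, $F_2^{\mu n}(x,Q^2)/F_2^{\mu p}(x,Q^2)$
SLAC                     $F_2^{e^-p}(x,Q^2)$, $F_2^{e^-d}(x,Q^2)$
E665                     $F_2^{\mu p}(x,Q^2)$, $F_2^{\mu d}(x,Q^2)$
CCFR, NuTeV, CHORUS      $F_2^{\nu(\bar\nu)N}(x,Q^2)$, $F_3^{\nu(\bar\nu)N}(x,Q^2)$
  (all of the above)                                                                       $q$, $\bar q$ at all $x$ and $g$ at medium, small $x$

H1, ZEUS                 $F_{2,c}^{e^\pm p}(x,Q^2)$, $F_{2,b}^{e^\pm p}(x,Q^2)$           $c$, $b$
E605, E772, E866         Drell-Yan $pN \to \mu\bar\mu + X$                                 $\bar q$ ($g$)
E866                     Drell-Yan $p$, $n$ asymmetry                                      $\bar u$, $\bar d$
CDF, D0                  $W^\pm$ rapidity asymmetry                                        $u/d$ ratio at high $x$
CDF, D0                  $Z^0$ rapidity distribution                                       $u$, $d$
CDF, D0                  inclusive jet data                                                $g$ at high $x$
H1, ZEUS                 DIS + jet data                                                    $g$ at medium $x$
CCFR, NuTeV              dimuon data                                                       strange sea $s$, $\bar s$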



Over the past 15 years, the quality and quantity of the data has improved enormously, so that
nowadays the pdfs are known to very high accuracy, typically to within a few % over a broad range of
$x$ away from $x = 0, 1$. In terms of recent developments, much attention has been focused on the heavy
quark ($s$, $c$, $b$) distributions. Until recently, the strange quark distribution was generally parametrised as

$$s(x,Q_0^2) = \bar s(x,Q_0^2) = \frac{\kappa}{2}\left[\bar u(x,Q_0^2) + \bar d(x,Q_0^2)\right] \qquad (2)$$



with $\kappa = 0.4$-$0.5$ suggested by (neutrino DIS) data. The suppression was understood as a
non-perturbative mass effect. Recent measurements of dimuon production in $\nu N$ DIS (for example, by



² In the case of $t\bar t$ production, an approximation to the (as yet uncalculated) full correction has been derived, see [3].



[Figure 1 schematic, box contents:
 Formalism: LO, NLO, NNLO DGLAP; MSbar factorisation; functional form at $Q_0^2$; sea quark (a)symmetry; etc.
 Data: DIS (SLAC, BCDMS, NMC, E665, CCFR, CHORUS, H1, ZEUS, ...); Drell-Yan (E605, E772, E866, ...);
       high-$E_T$ jets (CDF, D0); $W$ rapidity asymmetry (CDF, D0); $Z$ rapidity distribution (CDF, D0);
       $\nu N$ dimuon (CCFR, NuTeV); etc.
 Output: FORTRAN, C++ code in user-friendly form.]

Fig. 1: Anatomy of a pdf global fit.



CCFR and NuTeV) allow a more-or-less direct determination of both $s$ and $\bar s$, via

[equation: dimuon differential cross sections $d\sigma/dx\,dy$]

in the range $0.01 < x < 0.4$. The data appear to slightly prefer $s(x,Q_0^2) \neq \bar s(x,Q_0^2)$, both having a
different shape to the light sea quark distributions. Generalised parametrisations for $s$ and $\bar s$ are therefore
used in the most recent global fits.

The charm and bottom quarks are considered sufficiently massive to allow a pQCD treatment, i.e.
the distributions are assumed to be generated perturbatively via $g \to Q\bar Q$. Two regimes can be
distinguished: (i) $Q^2 \sim m_Q^2$, where it is essential to include the full $m_Q$ dependence in order to get the
correct threshold behaviour, and (ii) $Q^2 \gg m_Q^2$, where the heavy quarks can be treated as essentially massless
partons, with large logarithmic contributions of the form $\alpha_S^n \ln^n(Q^2/m_Q^2)$ automatically resummed by
the DGLAP equations. The so-called Fixed Flavour Number Scheme (FFNS), in which heavy quarks
are not treated as partons, is only valid in region (i), whereas the Zero Mass Variable Flavour Number
Scheme (ZM-VFNS), in which heavy quarks evolve as massless partons from zero at threshold, applies to
region (ii) only. In recent years, a more general set of General Mass Variable Flavour Number Schemes
(GM-VFNS) has been developed, with the advantage of interpolating smoothly and consistently between
the two $Q^2$ regions, at a given order (up to and including NNLO in practice) in perturbation
theory. The two most important points to note are: (i) the definition of a consistent GM-VFNS is tricky
and non-unique (not least due to the assignment of $\mathcal{O}(m_Q^2/Q^2)$ contributions), and implementation of
an improved treatment of heavy quarks can have a significant knock-on effect on light partons, and (ii)
GM-VFNS predictions for the structure functions $F_2^c$ and $F_2^b$ agree well with measurements at HERA.
A more detailed discussion of the treatment of heavy quark pdfs can be found in [5].

Another major advance in recent years has been the treatment of uncertainties in the distribution
functions, and most global fit groups produce 'pdfs with errors' sets. These are of course useful in
assessing the error on cross-section predictions due to the pdfs themselves. A typical package will consist
of a 'best fit' set and $\sim$ 30-40 error sets designed to reflect a $\pm 1\sigma$ variation of all the parameters used
to define the starting distributions, as determined by the uncertainties on the data used in the global fit.
However, in addition to these 'experimental' uncertainties, there are also uncertainties associated with
theoretical assumptions and/or prejudices in the way the global fit is set up and performed. Although
these are generally more difficult to quantify, they are often the main reason for the differences between
the sets produced by different groups. The following is a non-exhaustive list of the reasons why 'best fit'
pdfs and errors can differ:

• different data sets in the fit: 

— different subselection of data 

— different treatment of experimental systematic errors 

• different choice of: 

— pQCD order (in DGLAP and cross sections) 

— factorisation/renormalisation scheme/scale 

— $Q_0^2$

— parametric form $f_i(x, Q_0^2) = A x^{\alpha}(1 - x)^{\beta} c(x)$ etc., and implicit extrapolation

— $\alpha_S$

— treatment of heavy flavours

— theoretical assumptions about $x \to 0, 1$ behaviour

— theoretical assumptions about sea quark flavour asymmetry

— tolerance to define $\pm\delta f_i$

— evolution, cross-section codes, rounding errors etc. 

Note that these can apply both to comparisons of the type CTEQ vs. MRST vs. ... and to CTEQ6.1 vs.
CTEQ6.5 etc.
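As an illustration of how the 'pdfs with errors' sets described above are typically used, the sketch below evaluates the symmetric Hessian uncertainty on an observable from pairs of error sets. This is an assumption on our part rather than something spelled out in the text: it uses the commonly quoted symmetric 'master formula' $\Delta X = \frac{1}{2}\sqrt{\sum_k (X_k^+ - X_k^-)^2}$, where $X_k^\pm$ is the observable computed with the two error sets of eigenvector direction $k$, and the numbers in the usage example are invented.

```python
import math


def hessian_uncertainty(plus, minus):
    """Symmetric Hessian pdf uncertainty on an observable X.

    plus[k] and minus[k] are the values of X computed with the '+' and '-'
    error sets of eigenvector direction k (so 2*len(plus) error sets in total).
    """
    if len(plus) != len(minus):
        raise ValueError("need one '+' and one '-' value per eigenvector")
    return 0.5 * math.sqrt(sum((p - m) ** 2 for p, m in zip(plus, minus)))


if __name__ == "__main__":
    # Invented values of a cross section (arbitrary units) for three eigenvector pairs.
    x_plus = [101.0, 100.5, 99.8]
    x_minus = [99.0, 99.5, 100.2]
    print(f"pdf uncertainty: +/- {hessian_uncertainty(x_plus, x_minus):.2f}")
```

The size of the result depends directly on the tolerance used to define the error sets, which is one of the items in the list above, so uncertainties from different groups are not automatically comparable.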

3 Recent progress 

There are a number of groups producing pdf sets from global fits to data. In this section we give a very 
brief summary of these, with references to where more information can be found. 

The Martin-Stirling-Thorne-Watt MSTW (formerly Martin-Roberts-Stirling-Thorne MRST)
collaboration produces sets at LO, NLO and NNLO using a 'maximal' set of fitted data as described
in the previous section. The previous MRST2006 sets [6] contained an update of the NNLO fit to
include both pdf errors and an improved GM-VFNS treatment of $c$ and $b$. The new MSTW2008 sets [1,2]
include (i) new data sets in the fit (CHORUS and NuTeV neutrino data and HERA DIS+jet data), (ii)
a more sophisticated treatment of $s$ and $\bar s$ in which both are allowed to have independent shapes and
normalisations, and (iii) an improved treatment of the tolerance procedure to define the error sets (for a
summary see [1]).

The CTEQ collaboration (Ref. [7] and references therein) produces LO and NLO pdf sets from
global fits using roughly the same maximal data set as MSTW/MRST. Earlier this year, the previous
(2006) 6.5 set was updated to produce set 6.6. CTEQ6.5 was characterised by the first implementation
of a GM-VFNS (the 'SACOT-$\chi$' scheme [8,9]), which had a significant impact on the $c$ and $b$
distributions, a compensating impact on the $u$ and $d$ partons, and a corresponding change in the predictions for
$\sigma(W, Z)$. The new 6.6 set has a more sophisticated treatment of the $s$ and $\bar s$ pdfs, allowing these to have
a more general shape and normalisation than previously. The impact of an additional 'intrinsic charm'
contribution is also studied.

Given the similarity of the data fitted and the theoretical framework used, it is no surprise that the
pdf outputs from the MSTW and CTEQ analyses are similar. This is illustrated in Fig. 2, which compares
the latest MSTW2008 and CTEQ6.6 NLO $u$ and $g$ pdfs (with errors) at $Q^2 = 10^4$ GeV$^2$. Note that the
broader CTEQ error band is in part a reflection of a different choice of tolerance in defining the allowed
range of $\Delta\chi^2$. The MSTW gluon is smaller at small $x$, because the parameterisation at $Q_0^2 = 1$ GeV$^2$
allows the starting distribution to be negative at small $x$, unlike in the CTEQ (central) fits where the gluon
is always constrained to be positive.

[Figure 2 panels: up quark at $Q^2 = 10^4$ GeV$^2$ (left); gluon at $Q^2 = 10^4$ GeV$^2$ (right).]

Fig. 2: Comparison of recent MSTW and CTEQ up quark (left) and gluon (right) NLO parton distributions.

Alekhin et al. produce sets at LO, NLO and NNLO. The original 2002 (Alekhin) set [10] was updated
first in 2006 [11] (Alekhin-Melnikov-Petriello) and again in 2007 [12] (Alekhin-Kulagin-Petti).
The 2002 fit was based on DIS structure function data only (SLAC, BCDMS, NMC, E665, H1, ZEUS).
The 2006 AMP version added E605 and E866 Drell-Yan data, and CHORUS, CCFR and NuTeV neutrino
structure function and dimuon data. Unlike the CTEQ and MSTW/MRST fits, the Alekhin fit does
not include Tevatron high-$E_T$ jet data, nor a complete GM-VFNS treatment of heavy quarks, and this
accounts for much of the difference between the resulting parton distributions.

Both the H1 and ZEUS collaborations have produced pdf sets in the past based on their own
HERA DIS data supplemented by other DIS data. The most recent H1 (2003) set added BCDMS data
to H1 structure function data to give a broad coverage in $x$ and $Q^2$. The ZEUS (2005) set was based
on ZEUS data (both inclusive structure function and DIS+jet data) only. The two collaborations also
had different treatments of pdf errors: offset (ZEUS) vs. Hessian (H1). Recently H1 and ZEUS have
joined together to produce a combined pdf set, HERAPDF0.1, details of which can be found in the talk by
Gang Li [13]. Differences between the previous H1 and ZEUS fitting procedures have been resolved, and
experimental and model uncertainties have been carefully considered. However this fit uses only HERA
inclusive cross section NC and CC $e^\pm p$ data, and therefore there are small but significant differences
in both quark and gluon distributions in comparison with MSTW and CTEQ, which can in large part be
traced to the influence of Tevatron and fixed-target Drell-Yan data in the latter global fits.

The NNPDF (Neural Net) collaboration [15] uses neural net technology in the fit to avoid having
to choose a particular parametric form at $Q_0^2$. The new (NLO) set, NNPDF1.0, is based on a Monte
Carlo approach, with neural networks used as unbiased interpolants. The method is designed to provide
a faithful and statistically sound representation of the uncertainty on parton distributions. The fit is
performed to a restricted 'DIS only' data set in a ZM-VFNS scheme for the heavy quarks. The absence
of Drell-Yan and neutrino dimuon data from the fit means that the detailed flavour structure of the quark
sea is not well determined (and therefore neither are the predictions for $W$ and $Z$ cross sections at the
LHC, see Section 4 below). The absence of Tevatron high-$E_T$ jet data from the fit is another significant
source of difference between NNPDF and CTEQ/MSTW. A recent update (NNPDF1.1 [16]) introduces
independent parametrisations for the strange pdfs.

Finally, there have been a number of other studies of pdfs designed for particular purposes or
to investigate different theoretical frameworks. For example, the 'dynamical parton model' approach
(see [14] and references therein) attempts to describe DIS and other data from a set of valence-like
partons evolved upwards in $Q^2$ from a low starting scale. A reasonable fit is obtained, although the total
$\chi^2$ is significantly larger than in a (standard) fit in which the small-$x$ parameters are unconstrained.

Fig. 3: Standard Model cross section predictions at hadron-hadron colliders (left), and the parton $(x, Q^2)$ region probed by the
production of a heavy object of mass $M$ and rapidity $y$ at the LHC (right).

4 Parton distributions at the LHC 

There are a number of LHC standard-candle processes, $\sigma(W^\pm, Z^0, t\bar t, {\rm jets}, \ldots)$, that can be used to probe
and test pdfs, typically in the range $x \sim 10^{-2\pm1}$, $Q^2 \sim 10^{4\pm1}$ GeV$^2$ (see Fig. 3), which is where
most New Physics signals (Higgs, SUSY, etc.) are expected. The total $W$ and $Z$ cross sections provide
a particularly important point of comparison between the various pdf sets. A number of factors are
relevant, including (i) the rate of evolution from the $Q^2$ of the fitted DIS data to $Q^2 \sim 10^4$ GeV$^2$, driven
mainly by $\alpha_S$ and the gluon distribution, and (ii) the mix of quark flavours, since $F_2$ and $\sigma(W, Z)$ probe
different combinations of quark flavours. A very precise measurement of cross section ratios at LHC
(e.g. $\sigma(W^+)/\sigma(W^-)$ and $\sigma(W^\pm)/\sigma(Z)$) will allow these subtle quark flavour effects to be explored.

By way of example, we show in Fig. 4 a selection of predictions for $\sigma(W^\pm)$ and $\sigma(Z)$ LHC cross
sections [2]. The error ellipses correspond to the MSTW2008 NLO and NNLO pdf sets. Note that
the cross section ratios are determined more precisely than the absolute cross sections themselves. In
the case of the $W^+/W^-$ cross section ratio, the overall uncertainty is of order 1%, and comes mainly
from the uncertainty in the $u/d$ ratio at the relevant $x$ and $Q^2$ values. Note that the change in the
cross sections going from MRST2004 to MRST2006 is due to an improvement in the heavy flavour
prescription [6] discussed earlier, which mainly affects the charm distribution, while the predictions are
relatively stable in going from MRST2006 to MSTW2008. The CTEQ6.6 and CTEQ6.5 predictions are
very similar, but both are significantly higher than the CTEQ6.1 predictions. Again, this is mainly due
to a different treatment of $s$, $c$, $b$ quarks in the fit. The CTEQ6.6 LHC predictions are about 2% higher
than MSTW2008, because of slight differences in the quark ($u$, $d$, $s$, $c$) distributions, but overall the
predictions agree reasonably well within the quoted uncertainties. Care is however needed in comparing
predictions based on different orders of QCD perturbation theory (NLO, NNLO, NNLL-NLO, ...), since
the higher-order contributions to the cross sections are not negligible.

Fig. 4: $W^\pm$ vs. $Z$ (left) and $W^+$ vs. $W^-$ (right) total cross sections at the LHC calculated using various past and present pdf
sets at NLO and NNLO. The $1\sigma$ error ellipses are shown for the MSTW 2008 NLO and NNLO pdfs.

The error ellipses on the MSTW $W$ and $Z$ predictions come from the new 'dynamical tolerance'
treatment of pdf uncertainties described in [1]. There is an additional uncertainty of the same size from
scale variation (quantified in the usual way by varying the scales from $M/2$ to $2M$). Combining these,
we predict a total ('$1\sigma$') uncertainty of $\sim \pm 4\%$ on the total $W$ and $Z$ cross sections at LHC, and these
could therefore be useful in calibrating the machine luminosity. A more complete discussion of the role
of higher-order corrections in cross-section predictions at the LHC can be found in Refs. [17, 18].

It is clear from Fig. 3 that in order to probe very small $x$ at the LHC we need to produce relatively
light objects at forward rapidity, since then $x \sim (M/\sqrt{s}) \exp(-y) \ll 1$. The simplest process to use
for this purpose is Drell-Yan (DY) lepton pair production. At the LHC this requires good detection of
low-$p_T$ leptons in the forward region. Interestingly, this is precisely the region that will be accessible to
LHCb [19]. Translating the detector acceptance for muon pairs into the $(x, Q^2)$ plane gives the 'LHCb'
region shown in Fig. 5. There are two main impacts of such a measurement: (i) quark distributions can
be measured in the perturbative domain at smaller $x$ values than at HERA, and (ii) DGLAP evolution
over 1-2 orders of magnitude in $Q^2$ can be tested by comparing pdf measurements at the same (small) $x$
value at HERA and LHCb. Detailed studies are underway to quantify the improvement in pdf precision
at small $x$ resulting from the inclusion of such LHCb data in the global fit.
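The kinematic relation quoted above, $x \sim (M/\sqrt{s})\exp(-y)$, can be checked with a few lines of code. The sketch below assumes the usual leading-order kinematics for producing an object of mass $M$ at rapidity $y$, $x_{1,2} = (M/\sqrt{s})\,e^{\pm y}$; the function name and the example values (a low-mass pair at forward rapidity) are illustrative choices, not taken from the text.

```python
import math


def parton_x(M, y, sqrt_s):
    """Leading-order momentum fractions for an object of mass M at rapidity y.

    M and sqrt_s must be in the same units (e.g. GeV).
    Returns (x1, x2) with x1 * x2 * s = M**2.
    """
    x1 = (M / sqrt_s) * math.exp(y)
    x2 = (M / sqrt_s) * math.exp(-y)
    if x1 > 1.0 or x2 > 1.0:
        raise ValueError("kinematically forbidden: momentum fraction exceeds 1")
    return x1, x2


if __name__ == "__main__":
    # A Z boson produced centrally, and a light lepton pair produced far forward.
    for M, y in [(91.2, 0.0), (5.0, 4.0)]:
        x1, x2 = parton_x(M, y, 14000.0)
        print(f"M = {M} GeV, y = {y}: x1 = {x1:.2e}, x2 = {x2:.2e}")
```

The second case shows why light objects at large rapidity are needed: one momentum fraction is pushed to very small values while the other stays moderate.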



5 Summary 

In the past few years there has been progress in our understanding of parton distribution functions, and
convergence of the various approaches used to determine them. The main distinguishing features of the
currently available 'precision' pdf sets are (i) how heavy quarks are treated, (ii) how the tolerance for
determining pdf error sets is defined, and (iii) whether the Tevatron high-$E_T$ jet data are included in the
fit. If they are, then the high-$x$ gluon is slightly larger than the gluon derived from fits which are based
on structure function data only. In the context of a full NNLO global pdf analysis, the NNLO ($\mathcal{O}(\alpha_S^4)$)
corrections to the high-$E_T$ jet cross section are still the most important missing ingredient, although
their quantitative impact on the current partial-NNLO analysis is not expected to be large.

Fig. 5: The parton $(x, Q^2)$ region probed by Drell-Yan lepton pair production in LHCb (left), and the ratio of $\sum_q q\bar q$, $gg$ and
$GG$ (where $G = g + 4/9 \sum_q (q + \bar q)$) parton luminosities at 10 TeV and 14 TeV LHC (right).

The situation regarding the treatment of heavy quark flavour ($c$, $b$) distributions is now quite
satisfactory, with GM-VFNS generally accepted as the correct procedure. Within this framework, there
is good agreement with HERA data on $F_2^c$ and $F_2^b$. However, it is important to remember that
pQCD-generated heavy flavour distributions may not be the whole story. The issue of additional intrinsic heavy
flavour contributions, dominant at high $x$ where the structure function data are sparse, is still an open
question.

Early data from the LHC will be important for benchmarking a number of Standard Model
standard-candle cross sections. In the case of $\sigma(W)$ and $\sigma(Z)$, the (NNLO) cross sections are predicted to
approximately ±4% [2]. Note that such cross sections are not much smaller at $\sqrt{s} = 10$ TeV energy, since
they tend to sample small-$x$ partons that are not changing rapidly with $x$. This is illustrated in the
right-hand figure in Fig. 5, which shows the ratio of the parton luminosities at 10 TeV and 14 TeV for
$\sum_q q\bar q$ (relevant for $W$, $Z$, etc. production), $gg$ (relevant for Higgs, $t\bar t$ etc. production), and $GG$ (with
$G = g + 4/9 \sum_q (q + \bar q)$, relevant for high-$E_T$ dijet production) initial states.

Looking further ahead, a number of LHC measurements have the potential to constrain the pdfs 
even further. The most interesting appears to be the cross section for relatively low-mass Drell-Yan 
lepton pairs produced at large rapidity, which may be able to provide information on quark distributions 
at very small $x \sim 10^{-5}$-$10^{-6}$, outside the domain currently accessible at HERA. The LHCb detector
appears well suited to this measurement.